Bayesian Linear Regression

Analysis of Flight Delay Data

Sara Parrish, Heather Anderson (Advisor: Dr. Seals)

Nov 10, 2024

Introduction to Bayesian Linear Regression

  • Regression under the frequentist framework
    • Independent variables are used to predict dependent variables
    • Linear regression finds the best-fitting line through the observed data to make further predictions
    • Regression parameters (\beta) are assumed to be fixed
    • Only collected data is used for estimation
  • Regression under the Bayesian framework
    • Independent variables are used to predict dependent variables
    • Regression parameters (\beta) are not assumed to be fixed
    • Collected data is used alongside prior knowledge for estimation

Methods

Frequentist vs. Bayesian Approach

  • The Frequentist Approach
    • Typical linear model

Y = \beta_0 + \beta_1X + \varepsilon

  • Y : Dependent variable, the outcome
  • \beta_0 : y intercept
  • \beta_1 : The regression coefficient
  • X : Independent variable
  • \varepsilon : Random error [1]
  • \hat\beta provides a point estimate
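
As a minimal sketch of what a frequentist point estimate looks like, the closed-form least-squares solution for a single predictor can be computed directly. The analysis in this deck was done in R; Python is used here only for illustration, the helper name `ols_fit` is hypothetical, and the data are made-up values that lie exactly on a line.

```python
def ols_fit(x, y):
    """Return least-squares point estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


# Toy data (illustrative only) generated from y = 1 + 2x with no noise
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0_hat, b1_hat = ols_fit(x, y)
print(b0_hat, b1_hat)
```

Each parameter is summarized by a single number, which is the contrast the Bayesian slides draw against a full posterior distribution.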

Frequentist vs. Bayesian Approach

  • The Bayesian Approach
    • A regression is constructed using probability distributions, not point estimates as in the frequentist approach
    • Bayes' Rule is used to inform the model [1]

p(B|A) = \frac{p(A|B)\cdot p(B)}{p(A)}

  • Bayes' Rule allows for the calculation of inverse probability (p(B|A) \text{ from } p(A|B))
  • p(B|A) \text{ and } p(A|B) are conditional probabilities
  • p(A) \text{ and } p(B) are marginal probabilities [2]
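
A small worked example may make the inversion concrete. The sketch below applies Bayes' Rule with p(A) expanded by the law of total probability. The probabilities are invented for illustration and are not taken from the flight data, and `bayes_rule` is a hypothetical helper name.

```python
def bayes_rule(p_a_given_b, p_b, p_a_given_not_b):
    """Return p(B|A) from p(A|B), p(B), and p(A|not B)."""
    # Marginal probability of A via the law of total probability
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
    return p_a_given_b * p_b / p_a


# Invented numbers: B = "flight departs late", A = "flight arrives late"
posterior = bayes_rule(p_a_given_b=0.8, p_b=0.2, p_a_given_not_b=0.05)
print(round(posterior, 3))
```

With these inputs p(A) = 0.8 × 0.2 + 0.05 × 0.8 = 0.2, so p(B|A) = 0.16 / 0.2 = 0.8.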

Frequentist vs. Bayesian Approach

  • The Bayesian Approach
    • Bayesian Inference can be written simply [3]

Posterior = \frac{Likelihood \times Prior}{Normalization}

  • The Prior is a model of prior knowledge on the subject
  • The Likelihood is the probability of the observed data given the model parameters
  • The Normalization is a constant that ensures the posterior distribution is a valid density function that integrates to 1
  • The Posterior is the probability model that expresses an updated view of the model parameters
    • From the initial parameters of the prior
    • Updated with new data expressed in the likelihood function

Frequentist vs. Bayesian Approach

  • The Bayesian Approach
    • A more formal expression of Bayes' Rule applied to continuous parameters

\begin{align*} p(\theta|y) =& \frac{ L(\theta|y)p(\theta) }{p(y)}\\ \\ p(\theta|y) \propto & \text{ }L(\theta|y)p(\theta) \end{align*}

  • The normalization constant (p(y) above) ensures the posterior distribution is a valid distribution
    • The posterior density function can be written without this constant
  • The resulting prediction is not a point estimate, but a distribution [4]
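
One way to see the role of the normalization constant is a grid approximation: evaluate likelihood × prior on a grid of parameter values and divide by the sum. The sketch below does this for a Normal mean \theta with known standard deviation. The data, prior settings, and function names are all illustrative assumptions, not part of the flight analysis.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def grid_posterior(y, sigma, prior_mean, prior_sd, grid):
    """Return normalized posterior weights for theta on a grid."""
    unnorm = []
    for theta in grid:
        like = 1.0
        for yi in y:
            like *= normal_pdf(yi, theta, sigma)   # L(theta | y)
        unnorm.append(like * normal_pdf(theta, prior_mean, prior_sd))
    total = sum(unnorm)  # plays the role of p(y) on the grid
    return [u / total for u in unnorm]


y = [4.0, 6.0, 5.0]                            # invented observations
grid = [i * 0.01 for i in range(-500, 1501)]   # theta from -5 to 15
weights = grid_posterior(y, sigma=2.0, prior_mean=0.0, prior_sd=10.0, grid=grid)
post_mean = sum(t * w for t, w in zip(grid, weights))
print(round(post_mean, 3))
```

The result is a whole set of weights over \theta rather than a single number. For this Normal-Normal setup the grid mean should sit close to the known closed-form posterior mean, (15/4) / (1/100 + 3/4) ≈ 4.93.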

The Bayesian Approach

  • The Bayesian Linear Regression Model
    • changes based on:
      • distribution chosen for regression
      • distribution and hyperparameters chosen for priors
    • Our test case models a continuous outcome by a continuous predictor, so we use a Normal model with conjugate normal priors

Role of prior knowledge in shaping predictions

  • Priors can be subjective or objective
    • objective is preferred
  • Noninformative priors can be used when there is not adequate prior knowledge
  • Discounted priors are the result of adjusting a known prior to better reflect the current data [2]

Figure from [2]

Understanding the Bayesian Framework

  • Bayes’ theorem is used to update prior beliefs about model parameters with new data
    • This results in a posterior distribution [4]
  • Posterior distribution vs. point estimates
    • measures the uncertainty in predictions
    • richer picture for predictions
    • better uncertainty quantification [3]

Figure from [2]

Interpreting the Posterior

  • The marginal posterior density function may not be available in closed form
  • Markov Chain Monte Carlo (MCMC) is commonly used to approximate it
    • A Markov chain generates a sequence of draws whose distribution approaches the posterior
    • Monte Carlo integration over those draws approximates posterior summaries such as means and intervals
  • Some popular MCMC algorithms:
    • Gibbs sampler
    • Metropolis-Hastings (MH) [2]
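
To illustrate the idea, here is a minimal random-walk Metropolis-Hastings sketch for the mean of a Normal model with a Normal prior. It is not the sampler used in this project (Stan, via rstanarm, uses a different algorithm). The data, prior settings, step size, and function names are assumptions chosen for illustration.

```python
import math
import random


def log_post(theta, y, sigma, prior_mean, prior_sd):
    """Unnormalized log posterior: log likelihood + log prior (constants dropped)."""
    log_like = sum(-0.5 * ((yi - theta) / sigma) ** 2 for yi in y)
    log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
    return log_like + log_prior


def metropolis_hastings(y, sigma, prior_mean, prior_sd, n_draws, step_sd, start, seed=1):
    rng = random.Random(seed)
    theta = start
    current = log_post(theta, y, sigma, prior_mean, prior_sd)
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step_sd)   # symmetric random-walk proposal
        cand = log_post(proposal, y, sigma, prior_mean, prior_sd)
        # Accept with probability min(1, posterior ratio); 1 - random() avoids log(0)
        if math.log(1.0 - rng.random()) < cand - current:
            theta, current = proposal, cand
        draws.append(theta)
    return draws


y = [4.0, 6.0, 5.0]   # invented observations
draws = metropolis_hastings(y, sigma=2.0, prior_mean=0.0, prior_sd=10.0,
                            n_draws=20000, step_sd=1.0, start=0.0)
kept = draws[2000:]   # discard early draws as burn-in
mh_mean = sum(kept) / len(kept)
print(round(mh_mean, 2))
```

Only the ratio of posterior densities is needed, so the normalization constant p(y) never has to be computed.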

The Model

\begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \end{align*}

Where:

  • Y_i is the arrival delay for the i-th flight
  • X_i is the departure time for the i-th flight
  • \mu_i = \beta_0 + \beta_1X_i is the local mean arrival delay, specific to the departure time
  • \sigma^2 is the variance of the errors
  • \overset{\text{ind}}{\sim} indicates conditional independence of each arrival delay given the parameters

Prior Selection

  • Regression parameters
    • Intercept: \beta_0 \sim N(m_0, s^2_0)
    • Slope: \beta_1 \sim N(m_1, s^2_1)
    • Error: \sigma \sim \text{Exp}(l)

The Bayesian Linear Regression Model

The model can be written as

\begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \\ \beta_{0} &\sim N(m_0, s_0^2)\\ \beta_1 &\sim N(m_1, s_1^2)\\ \sigma &\sim \text{Exp}(l) \end{align*}

Tuning Hyperparameters

\begin{align*} \beta_{0c} &\sim N(2, 36^2)\\ \beta_{1} &\sim N(0.02, 0.01^2)\\ \sigma &\sim \text{Exp}(0.02) \end{align*}

The Updated Model

\begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \\ \beta_{0} &\sim N(2, 36^2)\\ \beta_1 &\sim N(0.02, 0.01^2)\\ \sigma &\sim \text{Exp}(0.02) \end{align*}
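
A quick way to check whether tuned priors imply plausible outcomes is to simulate from them before seeing any data. The sketch below draws parameters from the three priors above and generates arrival delays for a few predictor values. It treats \beta_0 as a plain (uncentered) intercept and assumes X is measured in minutes after midnight; both are assumptions for illustration, as are the function name and the x values.

```python
import random


def simulate_prior_predictive(x_values, n_sims, seed=84735):
    """Draw (b0, b1, sigma) from the priors and simulate Y at each x."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        b0 = rng.gauss(2.0, 36.0)        # beta_0 ~ N(2, 36^2)
        b1 = rng.gauss(0.02, 0.01)       # beta_1 ~ N(0.02, 0.01^2)
        sigma = rng.expovariate(0.02)    # sigma ~ Exp(0.02): rate 0.02, prior mean 50
        sims.append([rng.gauss(b0 + b1 * x, sigma) for x in x_values])
    return sims


# Hypothetical predictor values: 06:00, 12:00, 18:00 as minutes after midnight
x_values = [360, 720, 1080]
sims = simulate_prior_predictive(x_values, n_sims=1000)
avg_by_x = [sum(s[j] for s in sims) / len(sims) for j in range(len(x_values))]
print([round(a, 1) for a in avg_by_x])
```

If the simulated delays look wildly implausible (for example, typical delays of many hours), that suggests the hyperparameters need further tuning.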

Statistical Programming

  • Data were imported from CSV and analyzed in R [5].
  • Libraries
    • rstanarm [7]
      • stan_glm() function - simulation of model
      • posterior_predict() function - simulation of posterior
    • bayesrules [8]
      • prediction_summary() function - evaluation of posterior

Data Exploration and Visualization

The Dataset

| Header | Description |
|---|---|
| Fl Date | Flight Date (yyyy-mm-dd) |
| Airline | Airline Name |
| Airline DOT | Airline Name and Unique Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years. |
| Airline Code | Unique Carrier Code |
| DOT Code | An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation. |
| Fl Number | Flight Number |
| Origin | Origin Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused. |
| Origin City | Origin City Name, State Code |
| Dest | Destination Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused. |
| Dest City | Destination City Name, State Code |
| CRS Dep Time | CRS Departure Time (local time: hhmm) |
| Dep Time | Actual Departure Time (local time: hhmm) |
| Dep Delay | Difference in minutes between scheduled and actual departure time. Early departures show negative numbers. |
| Taxi Out | Taxi Out Time, in Minutes |
| Wheels Off | Wheels Off Time (local time: hhmm) |
| Wheels On | Wheels On Time (local time: hhmm) |
| Taxi In | Taxi In Time, in Minutes |
| CRS Arr Time | CRS Arrival Time (local time: hhmm) |
| Arr Time | Actual Arrival Time (local time: hhmm) |
| Arr Delay | Difference in minutes between scheduled and actual arrival time. Early arrivals show negative numbers. |
| Cancelled | Cancelled Flight Indicator (1=Yes) |
| Cancellation Code | Specifies The Reason For Cancellation |
| Diverted | Diverted Flight Indicator (1=Yes) |
| CRS Elapsed Time | CRS Elapsed Time of Flight, in Minutes |
| Actual Elapsed Time | Elapsed Time of Flight, in Minutes |
| Air Time | Flight Time, in Minutes |
| Distance | Distance between airports (miles) |
| Carrier Delay | Carrier Delay, in Minutes |
| Weather Delay | Weather Delay, in Minutes |
| NAS Delay | National Air System Delay, in Minutes |
| Security Delay | Security Delay, in Minutes |
| Late Aircraft Delay | Late Aircraft Delay, in Minutes |

Table 1: Flight Delay Summary by Flight Period

| Measure | Morning | Afternoon | Evening | Total |
|---|---|---|---|---|
| TotalFlightsCount | 1246031 (41.5%) | 1423140 (47.4%) | 330829 (11.0%) | 3000000 (100%) |
| CancelledFlightsCount | 30690 (38.8%) | 38343 (48.4%) | 10107 (12.8%) | 79140 (100%) |
| DivertedFlightsCount | 2555 (36.2%) | 3901 (55.3%) | 600 (8.5%) | 7056 (100%) |
| AvgCRSDepTime | 08:49:31 | 15:73:19 | 20:66:23 | 13:27:04 |
| AvgDepTime | 08:53:58 | 15:89:05 | 20:12:40 | 13:29:47 |
| AvgDepDelay | 5.23 | 12.93 | 16.51 | 10.12 |
| AvgTaxiOut | 16.87 | 16.44 | 16.65 | 16.64 |
| AvgTaxiIn | 7.75 | 7.78 | 6.95 | 7.68 |
| AvgCRSArrTime | 10:87:15 | 17:85:11 | 17:42:14 | 14:90:34 |
| AvgArrTime | 10:86:01 | 17:71:56 | 15:89:47 | 14:66:31 |
| AvgArrDelay | -0.77 | 7.34 | 10.04 | 4.26 |
| AvgAirTime | 114.12 | 109.8 | 116.31 | 112.31 |
| CarrierDelayCount | 86824 (29.2%) | 162266 (54.6%) | 47861 (16.1%) | 296951 (100%) |
| SecurityDelayCount | 887 (32.1%) | 1434 (52.0%) | 438 (15.9%) | 2759 (100%) |
| WeatherDelayCount | 8380 (26.7%) | 18758 (59.7%) | 4290 (13.7%) | 31428 (100%) |
| NASDelayCount | 80604 (31.4%) | 144366 (56.3%) | 31507 (12.3%) | 256477 (100%) |
| LateAircraftDelayCount | 42721 (16.5%) | 168902 (65.2%) | 47391 (18.3%) | 259014 (100%) |

Summary includes morning, afternoon, and evening flight periods.

Data Preprocessing

Notable changes:

  • Time format
  • Day of the week variable
  • Removal of cancelled and diverted flights
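
The three changes listed above could look roughly like the following sketch. The deck does not show the actual preprocessing code (which was written in R), so the column names, the choice to convert hhmm clock values to minutes after midnight, and the helper names are all assumptions made for illustration.

```python
import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def hhmm_to_minutes(hhmm):
    """Convert an hhmm clock value such as 1345 to minutes after midnight."""
    hours, minutes = divmod(int(hhmm), 100)
    return hours * 60 + minutes


def preprocess(rows):
    """Drop cancelled/diverted flights and add time and weekday fields."""
    cleaned = []
    for row in rows:
        # Cancelled and diverted flights are removed
        if row["CANCELLED"] == 1 or row["DIVERTED"] == 1:
            continue
        new_row = dict(row)
        new_row["DEP_MINUTES"] = hhmm_to_minutes(row["DEP_TIME"])
        flight_date = datetime.date.fromisoformat(row["FL_DATE"])
        new_row["DAY_OF_WEEK"] = DAY_NAMES[flight_date.weekday()]  # Monday is 0
        cleaned.append(new_row)
    return cleaned


# Two invented rows using hypothetical column names
rows = [
    {"FL_DATE": "2019-01-09", "DEP_TIME": 1345, "CANCELLED": 0, "DIVERTED": 0},
    {"FL_DATE": "2019-01-10", "DEP_TIME": 900, "CANCELLED": 1, "DIVERTED": 0},
]
cleaned = preprocess(rows)
print(cleaned)
```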

Modeling

The Normal Data Model: Departure Time Predictor

\begin{align*} Y_i|\beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_i \\ \beta_{0} &\sim N(m_0, s_0^2)\\ \beta_1 &\sim N(m_1, s_1^2)\\ \sigma &\sim \text{Exp}(l) \end{align*}

The Normal Data Model: Week Day Predictor

\begin{align*} Y_i|\beta_0, \beta_1, \dots, \beta_6, \sigma &\overset{\text{ind}}{\sim} N (\mu_i, \sigma^2) && \text{with } && \mu_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \dots + \beta_6X_{i6} \\ \beta_{0} &\sim N(m_0, s_0^2)\\ \beta_j &\sim N(m_j, s_j^2), \quad j = 1, \dots, 6\\ \sigma &\sim \text{Exp}(l) \end{align*}
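
The six predictors X_{i1}, ..., X_{i6} can be read as indicator variables for the day of the week. The sketch below shows one such coding. It assumes Tuesday is the reference level, which is inferred from Table 2 listing coefficients for Wednesday through Monday only, and the function names and coefficient values are hypothetical.

```python
# Order follows the coefficient labels in Table 2; Tuesday is the assumed reference level
LEVELS = ["Wednesday", "Thursday", "Friday", "Saturday", "Sunday", "Monday"]


def weekday_indicators(day):
    """Return the six 0/1 indicators (X_1, ..., X_6) for a weekday name."""
    return [1 if day == level else 0 for level in LEVELS]


def mean_delay(day, beta0, betas):
    """Compute mu = beta_0 + beta_1 * X_1 + ... + beta_6 * X_6 for one day."""
    x = weekday_indicators(day)
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


# Made-up coefficients, for illustration only
example_betas = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(weekday_indicators("Friday"))
print(mean_delay("Tuesday", 1.0, example_betas))
print(mean_delay("Friday", 1.0, example_betas))
```

Under this coding the intercept is the mean arrival delay on the reference day, and each \beta_j is the difference between that day's mean and the reference day's mean.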

MCMC ACF

Results

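Table 2. Estimations of the Posterior Distributions’ Regression Coefficients

| Predictor | Model | Parameter | Mean | SD | 95% CI |
|---|---|---|---|---|---|
| Continuous | Flat Model | 𝛽₀ intercept | -10.92 | 0.47 | (-11.85; 0.02) |
| Continuous | Flat Model | 𝛽₁ Departure Time | 0.02 | 0.00 | (0.02; 50.86) |
| Continuous | Flat Model | 𝜎 | 51.09 | 0.12 | (50.86; -11.86) |
| Continuous | Default Tuned Model | 𝛽₀ intercept | -10.94 | 0.47 | (-11.86; 0.02) |
| Continuous | Default Tuned Model | 𝛽₁ Departure Time | 0.02 | 0.00 | (0.02; 50.86) |
| Continuous | Default Tuned Model | 𝜎 | 51.09 | 0.12 | (50.86; -12.02) |
| Continuous | Tuned Model | 𝛽₀ intercept | -11.66 | 0.17 | (-12.02; 0.02) |
| Continuous | Tuned Model | 𝛽₁ Departure Time | 0.02 | 0.00 | (0.02; 50.87) |
| Continuous | Tuned Model | 𝜎 | 51.10 | 0.12 | (50.87; 0.00) |
| Categorical | Flat Model | 𝛽₀ intercept | 1.57 | 0.44 | (0.69; 3.72) |
| Categorical | Flat Model | 𝛽₁ Wednesday | 4.92 | 0.61 | (3.72; 1.47) |
| Categorical | Flat Model | 𝛽₂ Thursday | 2.66 | 0.65 | (1.47; 0.69) |
| Categorical | Flat Model | 𝛽₃ Friday | 1.86 | 0.62 | (0.69; 2.25) |
| Categorical | Flat Model | 𝛽₄ Saturday | 3.43 | 0.61 | (2.25; 3.21) |
| Categorical | Flat Model | 𝛽₅ Sunday | 4.36 | 0.61 | (3.21; 1.72) |
| Categorical | Flat Model | 𝛽₆ Monday | 2.93 | 0.65 | (1.72; 51.16) |
| Categorical | Flat Model | 𝜎 | 51.39 | 0.12 | (51.16; 0.73) |
| Categorical | Default Tuned Model | 𝛽₀ intercept | 1.54 | 0.40 | (0.73; 3.21) |
| Categorical | Default Tuned Model | 𝛽₁ Wednesday | 4.40 | 0.60 | (3.21; 1.48) |
| Categorical | Default Tuned Model | 𝛽₂ Thursday | 2.69 | 0.60 | (1.48; 1.75) |
| Categorical | Default Tuned Model | 𝛽₃ Friday | 2.97 | 0.64 | (1.75; 3.77) |
| Categorical | Default Tuned Model | 𝛽₄ Saturday | 4.96 | 0.59 | (3.77; 2.31) |
| Categorical | Default Tuned Model | 𝛽₅ Sunday | 3.48 | 0.58 | (2.31; 0.74) |
| Categorical | Default Tuned Model | 𝛽₆ Monday | 1.89 | 0.60 | (0.74; 51.17) |
| Categorical | Default Tuned Model | 𝜎 | 51.40 | 0.12 | (51.17; 0.66) |
| Categorical | Tuned Model | 𝛽₀ intercept | 1.54 | 0.44 | (0.66; 3.76) |
| Categorical | Tuned Model | 𝛽₁ Wednesday | 4.96 | 0.64 | (3.76; 1.48) |
| Categorical | Tuned Model | 𝛽₂ Thursday | 2.70 | 0.61 | (1.48; 0.71) |
| Categorical | Tuned Model | 𝛽₃ Friday | 1.91 | 0.64 | (0.71; 2.28) |
| Categorical | Tuned Model | 𝛽₄ Saturday | 3.50 | 0.63 | (2.28; 3.23) |
| Categorical | Tuned Model | 𝛽₅ Sunday | 4.41 | 0.61 | (3.23; 1.73) |
| Categorical | Tuned Model | 𝛽₆ Monday | 2.99 | 0.64 | (1.73; 51.17) |
| Categorical | Tuned Model | 𝜎 | 51.39 | 0.12 | (51.17; 0.00) |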

Comparison of the Models

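Table 3.

| Predictor | Model | MAE | MAE Scaled | Within 50% | Within 95% |
|---|---|---|---|---|---|
| Continuous Predictor | Flat Model | 15.730 | 0.313 | 0.841 | 0.966 |
| Continuous Predictor | Default Tuned Model | 15.779 | 0.314 | 0.840 | 0.966 |
| Continuous Predictor | Tuned Model | 15.668 | 0.312 | 0.849 | 0.966 |
| Categorical Predictor | Flat Model | 17.110 | 0.338 | 0.866 | 0.965 |
| Categorical Predictor | Default Tuned Model | 17.080 | 0.338 | 0.866 | 0.966 |
| Categorical Predictor | Tuned Model | 17.118 | 0.339 | 0.867 | 0.966 |

Posterior predictive results from cross validation.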
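
For readers unfamiliar with these four columns, the sketch below shows one way such summaries can be computed from posterior predictive draws. It is not the bayesrules implementation. It assumes MAE is the median absolute error around each observation's posterior predictive mean, that the scaled version divides by the posterior predictive standard deviation, and that the "within" columns are the share of observations inside the central 50% and 95% predictive intervals. The data are invented.

```python
import statistics


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def prediction_summary_sketch(y_obs, draws_per_obs):
    """Summarize predictive accuracy from one list of draws per observation."""
    abs_err, abs_err_scaled, in_50, in_95 = [], [], [], []
    for y, draws in zip(y_obs, draws_per_obs):
        s = sorted(draws)
        center = statistics.mean(s)     # posterior predictive mean
        spread = statistics.stdev(s)    # posterior predictive standard deviation
        abs_err.append(abs(y - center))
        abs_err_scaled.append(abs(y - center) / spread)
        in_50.append(quantile(s, 0.25) <= y <= quantile(s, 0.75))
        in_95.append(quantile(s, 0.025) <= y <= quantile(s, 0.975))
    return {
        "mae": statistics.median(abs_err),
        "mae_scaled": statistics.median(abs_err_scaled),
        "within_50": sum(in_50) / len(in_50),
        "within_95": sum(in_95) / len(in_95),
    }


# Invented example: three observations sharing the same predictive draws 0..100
draws = [float(i) for i in range(101)]
summary = prediction_summary_sketch([50.0, 80.0, 200.0], [draws, draws, draws])
print(summary)
```

Table 3 reports cross-validated values, so in practice these summaries would be computed on held-out folds and then averaged.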

References

[1] X. Yan and X. G. Su, Linear regression analysis: Theory and computing. Singapore: World Scientific Publishing, 2009. Available: https://ebookcentral.proquest.com/lib/uwf/reader.action?docID=477274&ppg=318&pq-origsite=primo
[2] E. Lesaffre and A. B. Lawson, Bayesian biostatistics, 1st ed. Somerset: John Wiley & Sons, Ltd, 2012. doi: https://doi.org/10.1002/9781119942412.
[3] W. Koehrsen, “Introduction to bayesian linear regression.” https://towardsdatascience.com/introduction-to-bayesian-linear-regression-e66e60791ea7, Apr. 2018.
[4] T. Bayes, “An essay towards solving a problem in the doctrine of chances,” 1763.
[5] R Core Team, R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2023. Available: https://www.R-project.org/
[6] P. Zelazko, “Flight delay and cancellation dataset (2019-2023).” https://www.kaggle.com/datasets/patrickzel/flight-delay-and-cancellation-dataset-2019-2023, Nov. 2023.
[7] S. Brilleman, M. Crowther, M. Moreno-Betancur, J. Buros Novik, and R. Wolfe, “Joint longitudinal and time-to-event models via Stan.” 2018. Available: https://github.com/stan-dev/stancon_talks/
[8] M. Dogucu, A. Johnson, and M. Ott, Bayesrules: Datasets and supplemental functions from bayes rules! book. 2021. Available: https://github.com/bayes-rules/bayesrules